(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 




(19) World Intellectual Property Organization 

International Bureau

(43) International Publication Date: 30 August 2001 (30.08.2001)

(10) International Publication Number: WO 01/63835 A1

(51) International Patent Classification: H04L 9/36

(21) International Application Number: PCT/US01/05541

(22) International Filing Date: 21 February 2001 (21.02.2001)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
60/183,727    21 February 2000 (21.02.2000)    US
60/183,728    21 February 2000 (21.02.2000)    US



(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ,
DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR,
HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR,
LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ,
NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM,
TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).



(71) Applicant: CLICKSAFE.COM LLC [US/US]; 40 East Reading Road, Edison, NJ 08817 (US).

(72) Inventor: LIANG, Yufeng; B3 Sutton Drive, Matawan, NJ 07747 (US).

(74) Agents: MORRIS, Francis, E. et al.; Pennie & Edmonds LLP, 1155 Avenue of the Americas, New York, NY 10036 (US).

Published:
- with international search report
- before the expiration of the time limit for amending the claims and to be republished in the event of receipt of amendments

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.



(54) Title: SYSTEM AND METHOD FOR IDENTIFYING AND BLOCKING PORNOGRAPHIC AND OTHER WEB CONTENT
ON THE INTERNET




[Front-page drawing: block diagram with legible labels CLIENT (IE, NETSCAPE), URL CACHE, ANSWER, and UPDATE.]
(57) Abstract: A system and method are disclosed for identifying and blocking unacceptable web content, including pornographic
web content. In a preferred embodiment, the system comprises a proxy server (14) connected between a client (16) and the Internet
that checks a requested URL against a block list (18) that may include URLs identified by a web spider. If the URL is not on the
block list, the proxy server requests the web content. When the web content is received, the proxy server processes its text content by
using filter engine (22) and compares the processing results using a thresholder (508). If necessary, the proxy server then processes
the image content of the retrieved web content to determine if it comprises skin tones and textures by using texture filter (712) and
tone filter (710). Based on these processing results, the proxy server may either block the retrieved web content or permit user access
to it. Also disclosed is a system and method for inserting advertisements into retrieved web content.

SYSTEM AND METHOD FOR IDENTIFYING AND BLOCKING
PORNOGRAPHIC AND OTHER WEB CONTENT ON THE INTERNET

This application claims priority to United States Provisional Application No. 60/183,727
and United States Provisional Application No. 60/183,728, each of which is hereby
incorporated by reference.

Background of the Invention 
Tools for identifying and blocking pornographic websites on the Internet are known in
the art. Typically, these tools comprise a "block" list comprising URLs of known
pornographic sites. When an unauthorized user attempts to retrieve web content from a site
on the block list, the user's browser blocks the request.

It is difficult, however, to keep the block list current because objectionable web sites
are constantly being added to the Internet. Moreover, these prior art tools fail to block sites
that are not on the block list.

Summary of the Invention 
A system and method are disclosed for identifying and blocking unacceptable web
content, including pornographic web content. In a preferred embodiment, the system
comprises a proxy server connected between a client and the Internet that processes requests
for web content. The proxy server checks the requested URL against a block list that may
include URLs identified by a web spider. If the URL is not on the block list, the proxy server
requests the web content.

When the web content is received, the proxy server processes its text content and
compares the processing results using a thresholder. If necessary, the proxy server then
processes the image content of the retrieved web content to determine if it comprises skin
tones and textures. Based on these processing results, the proxy server may either block the
retrieved web content or permit user access to it.

Also disclosed is a system and method for inserting advertisements into retrieved web
content. In a preferred embodiment, the system inserts html content that may comprise a
hyperlink into the top portion of the retrieved web content.

Brief Description of the Drawings
The above summary of the invention will be better understood when taken in
conjunction with the following detailed description and accompanying drawings, in which:
Fig. 1 is a block diagram of a first preferred embodiment of the present system;
Fig. 2 is a block diagram of a second preferred embodiment of the present system;
Fig. 3 is a flow diagram depicting a preferred process implemented by the
embodiments shown in Figs. 1 and 2;
Fig. 4A is a flow diagram depicting a preferred embodiment of a text analysis
algorithm employed by the present system;
Fig. 4B is a preferred embodiment of a lexicon of words and values assigned to them
employed by the present system;
Fig. 5 is a block diagram of a preferred text analysis engine of the present system;
Fig. 6 is a flow diagram depicting a preferred embodiment of an algorithm for
determining the h values used by the text analysis engine of Fig. 5;
Fig. 7 is a block diagram of a preferred image analysis engine of the present system;
Fig. 8A is a flow diagram depicting a preferred filtering algorithm for use in the
present system;
Fig. 8B depicts an image area to be filtered using the filtering algorithm depicted in
Fig. 8A;
Fig. 9 is a flow chart depicting a preferred algorithm employed by a web spider to
create a list of unacceptable web sites; and
Fig. 10 is a flow chart depicting a preferred algorithm for inserting advertisements
into retrieved web content.

Detailed Description of the Preferred Embodiments

Fig. 1 is a block diagram of a first preferred embodiment of the present system. As
shown in Fig. 1, the system preferably comprises a proxy server 14 that is designed to receive
URL requests for web content from a client 16. Typically, client 16 will be one of many
clients connected to a network (not shown). Each request for web content by a client 16 that
is transmitted over the network is forwarded to proxy server 14 for processing.

Proxy server 14 determines whether the request is permissible (as described in more
detail below) and, if it is, forwards the request to an appropriate web site (not shown) via
world-wide-web 12. When a web page or other content is received from the web site, proxy
server 14 determines whether the content is acceptable, and, if it is, forwards the web page to
client 16.

In a preferred embodiment, a URL is deemed acceptable if it does not identify a
pornographic web site. Similarly, a web page or other web content is acceptable if it does not 
comprise pornographic content. 

As further shown in Fig. 1, the system also preferably comprises a URL cache 18 that
stores a list of impermissible URLs. In addition, the system preferably comprises a local
word list 20 and a filter engine 22 which are used by proxy server 14 to identify pornographic
material, as described in more detail below.

In a preferred embodiment, URL cache 18 may be populated in several ways. First,
cache 18 may be populated with a list of known pornographic websites. Second, an
authorized user may specify specific URLs that are unacceptable. Third, an authorized user
may specify specific URLs that are acceptable (i.e., that should not be blocked, even though
the remaining components of the system, described below, would identify the content as
pornographic). Fourth, URL cache 18 may be populated by a web spider. A preferred
embodiment of a particular web spider for use with the present system is described in more
detail below.

In a preferred embodiment, when a site is designated acceptable even though it
comprises pornographic material, access to that site is limited to authorized individuals, such
as, for example, the individual that designated the site acceptable. In this way, for example,
an adult may designate certain sites acceptable and nevertheless block access to such sites by
a child.

Also shown in Fig. 1 is a main server 10. Main server 10 serves several functions
including maintaining an updated list of unacceptable URLs, as described in more detail
below. Typically, main server 10 is not co-located with proxy server 14 or client 16. Rather,
it is typically located in a remote location from where it may provide updated unacceptable
URL lists and other services to a plurality of proxy servers 14 and clients 16.

Fig. 2 is an alternative preferred embodiment of the present system. As shown in Fig.
2, in this alternative embodiment, a client 16 may be connected directly to the Internet. In
that event, URL cache 18, local word list 20, filter engine 22, as well as software 24 for using
these modules are preferably resident in client 16.

Fig. 3 is a flow diagram depicting a preferred process implemented by the
embodiments shown in Figs. 1 and 2. For purposes of ease of description, the following
description will refer primarily to the architecture disclosed in Fig. 1. It will be understood,
however, that the same steps may be performed by corresponding components shown in Fig.
2. In addition, it should be noted that although the steps in Fig. 3 are demonstrated as
sequential, the text and image analysis engines described below may instead be designed to
operate in parallel. In particular, parallel operation may be desirable when large processing
resources are available, while the serial approach described below may be preferable when
there is a desire to conserve processing resources.

Turning to Fig. 3, in step 302, a user enters a URL onto the command line of his or
her browser. In step 304, server 14 compares the URL to the list of unacceptable URLs
stored in URL cache 18. If the URL is on the list, then server 14 blocks the user's request,
and does not obtain the requested web page specified by the URL.
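
By way of illustration only, the comparison of step 304 may be sketched as follows. The sketch assumes that URL cache 18 is held in memory as a set and that URLs are compared after a simple normalization to lowercase host plus path; neither assumption comes from the present description. The names `normalize`, `UrlCache`, and `is_blocked` are hypothetical, and the handling of authorized individuals for acceptable URLs is omitted.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to lowercase host plus path (an assumed normalization)."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    return host + path


class UrlCache:
    """Hypothetical in-memory stand-in for URL cache 18."""

    def __init__(self, blocked=(), allowed=()):
        self.blocked = {normalize(u) for u in blocked}
        self.allowed = {normalize(u) for u in allowed}

    def is_blocked(self, url: str) -> bool:
        key = normalize(url)
        # URLs explicitly designated acceptable are never blocked in this sketch.
        if key in self.allowed:
            return False
        return key in self.blocked
```

A request would be fetched (step 306) only when `is_blocked` returns `False` for the requested URL.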

Otherwise, if the URL is acceptable, server 14 transmits a URL request via web 12 to
retrieve the requested web page (step 306). When the web page is returned, server 14
conducts a text analysis of the text content of the web page (step 308). A preferred
embodiment of this text analysis is described in connection with Figs. 4-6.

As shown in Fig. 4A, in step 402, server 14 first analyzes the text content of the
retrieved web page and identifies every word or combination of words that it contains. It
should be noted that this text search preferably includes not only text that is intended to be
displayed to the user, but also html meta-text such as hyperlinks. It should also be noted that
the identified words may include a substring within a longer word in the text.

In step 404, server 14 compares each word and combination of words to a lexicon of
words stored in local word list 20. A preferred embodiment of lexicon 20 is shown in Fig. 4B.

It should be noted that each of the words in the lexicon shown in Fig. 4B has two
values following it, and that those words associated with the preferred embodiment being
discussed presently are those that have a "0" as their second value. These words are
associated with pornography and are utilized by the system to identify pornographic material,
as described below. Words having a value other than "0" as their second value are preferably
associated with other concepts or categories of material, as described in more detail below.

As further shown in Fig. 4B, each word or combination of words in local word list 20
is also assigned a first value. In the preferred embodiment shown in Fig. 4B, this first value
is between 0.25 and 8. If a word or combination of words found in the web content is in the
lexicon, server 14 retrieves this assigned value for the word or combination of words.

In step 406, server 14 uses the retrieved values as inputs to a text analysis engine for
determining a score that is indicative of the likelihood that the retrieved web content is
pornographic. In a preferred embodiment, the text analysis engine employs artificial
intelligence to determine the likelihood that the retrieved web content is pornographic. A
block diagram of a preferred text analysis engine is described in connection with Fig. 5.

As shown in Fig. 5, text analysis engine 502 preferably comprises a plurality of inputs
x_1, x_2, ..., x_n which are provided to multipliers 504. Each x_i represents the value retrieved
from local word list 20 for the i-th word or combination of words found in the text of the
retrieved web content. It should be noted that if a word in the lexicon appears n times in the
text, the system preferably multiplies the retrieved value assigned to the word by n and
supplies this product as input x_i to text analysis engine 502.

Each multiplier 504 multiplies one input x_i by a predetermined factor h_i. A preferred
method for determining factors h_1, h_2, ..., h_n is described below.

The outputs of multipliers 504 are then added by an adder 506. The output of adder 506
is then provided to a thresholder 508 that implements a sigmoid function. The output of
thresholder 508 therefore may be: 1) less than a lower threshold; 2) between a lower threshold
and an upper threshold; or 3) above the upper threshold. In a preferred embodiment, the
lower threshold may be approximately 0.25 and the upper threshold may be approximately
0.5.
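
By way of illustration only, the multiply, add, and sigmoid stages of text analysis engine 502 may be sketched as follows. The lexicon entries below are neutral placeholders, not terms from Fig. 4B. The `midpoint` and `scale` parameters of the sigmoid are assumptions introduced so that an empty page scores near zero; the present description does not specify the exact form of the sigmoid. For simplicity the sketch matches whole tokens only and does not search for substrings or word combinations.

```python
import math
import re
from collections import Counter

# Hypothetical lexicon excerpt: term -> first value (placeholder terms).
LEXICON = {"alpha": 8.0, "beta": 2.0, "gamma": 0.25}


def text_score(text, lexicon, h=None, midpoint=8.0, scale=2.0):
    """Return a score in (0, 1) from the weighted sum of lexicon hits."""
    words = re.findall(r"[a-z0-9&/-]+", text.lower())
    counts = Counter(w for w in words if w in lexicon)
    total = 0.0
    for term, n in counts.items():
        x_i = lexicon[term] * n                       # value times occurrence count
        h_i = 1.0 if h is None else h.get(term, 1.0)  # factor applied by multiplier 504
        total += x_i * h_i                            # adder 506
    # Thresholder 508: a shifted logistic function (assumed form).
    return 1.0 / (1.0 + math.exp(-(total - midpoint) / scale))
```

With these assumed parameters, a page with no lexicon hits scores about 0.02, and a page containing a term valued 8.0 twice scores about 0.98.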

Returning to step 308 of Fig. 3, if the output of thresholder 508 is below the lower
threshold, then server 14 concludes that the retrieved web content is not pornographic, and
server 14 forwards the retrieved web content to client 16 (step 310). If the output of
thresholder 508 is above the upper threshold, then server 14 concludes that the retrieved web
content is pornographic, and server 14 "blocks" the content by not sending it to client 16 (step
312).

If, however, the output of thresholder 508 is above the lower threshold but below the
upper threshold, then the system proceeds to step 314, where it analyzes the image content of
the retrieved web content to determine whether the retrieved web content is pornographic.
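
By way of illustration only, the three-way decision among steps 310, 312, and 314 may be sketched as follows. The function name `route` and the returned labels are hypothetical, and treating a score exactly equal to a threshold as inconclusive is an assumption, since the description does not address that boundary case.

```python
LOWER, UPPER = 0.25, 0.5  # approximate thresholds given in the description


def route(score, lower=LOWER, upper=UPPER):
    """Map a thresholder output to the next step of Fig. 3."""
    if score < lower:
        return "forward"         # step 310: send content to the client
    if score > upper:
        return "block"           # step 312: withhold content
    return "analyze_images"      # step 314: text analysis was inconclusive
```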

Before turning to step 314, however, a preferred embodiment for determining the h 
values used by the text analysis engine is first described in connection with Fig. 6. The steps 
in this preferred embodiment may, for example, be performed by main server 10.

As shown in Fig. 6, in step 602 a plurality of web sites are shown to a plurality of
people. With respect to each web site, each person states whether they consider the site's
content to be pornographic or not. In step 604, the text content of each web page categorized
by the plurality of people is analyzed to identify every word and combination of words that it
contains. In step 606, each word and combination of words is compared to a lexicon of
words, typically the same as the lexicon stored in local word list 20. If a word or combination
of words found in the web content is in the lexicon, the assigned value for the word or
combination of words is retrieved.

In step 608, the system generates an equation for each person's opinion as to each web 
site. Specifically, the system generates the following set of equations: 

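$$
\begin{aligned}
(x_1^{(1)} \cdot h_1) + (x_2^{(1)} \cdot h_2) + \dots + (x_n^{(1)} \cdot h_n) &= y_1 \\
(x_1^{(2)} \cdot h_1) + (x_2^{(2)} \cdot h_2) + \dots + (x_n^{(2)} \cdot h_n) &= y_2 \\
&\;\;\vdots \\
(x_1^{(m)} \cdot h_1) + (x_2^{(m)} \cdot h_2) + \dots + (x_n^{(m)} \cdot h_n) &= y_m
\end{aligned}
$$

OR:

$$[X] \cdot [H] = [Y]$$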
where:

$x_i$ is the value retrieved from the database for the i-th word or combination of words found in
the text of the web site that is also in the lexicon,

$h_i$ is the multiplier to be calculated for the i-th word or combination of words found in the text
of the web site that is also in the lexicon, and

$y_j$ is either 0 or 1 depending on whether the j-th person stated that he or she found the web site
to be pornographic or not (0 = not pornographic).

In step 610, the system solves this matrix of equations as:

$$[H] = [X]^{-1} [Y]$$

It should be noted that when [X] does not have an inverse, a least square algorithm may
instead be used as an approximation for the value of $[X]^{-1}$. It should also be noted that if the
x values are chosen wisely, then one may expect the h values to fall between 0.9 and 1.1.
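
By way of illustration only, the least square approximation mentioned above may be sketched as follows. The sketch solves the normal equations (X^T X) h = X^T y by Gaussian elimination, which is one common way to obtain a least squares solution and is not stated to be the method used by the present system. It assumes the columns of X are linearly independent. The function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(X, y):
    """Return h minimizing the squared error of X h = y via the normal equations."""
    n = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * y[k] for k, row in enumerate(X)) for i in range(n)]
    return solve_linear(xtx, xty)
```

Each row of `X` would hold the x values for one person's opinion of one web site, and the matching entry of `y` would hold that person's 0 or 1 answer.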

Returning to Fig. 3, recall that when the text analysis fails to conclusively demonstrate 
whether the retrieved web content is or is not pornographic, the system proceeds to step 314 
where an image analysis of the retrieved web content is performed. A preferred embodiment
for performing this image analysis is described in connection with Fig. 7. 

Fig. 7 is a block diagram of a preferred image analysis engine of the present system.
As shown in Fig. 7, an image analysis engine 702 preferably comprises an adder 704 that
receives the luminescence values for the red, green, and blue components of each pixel in the
image and adds them to determine brightness (L = R+G+B). A first divider 706 divides the
pixel's red value by this sum to determine the normalized red value r, where r = R/(R+G+B).
Similarly, a second divider 708 divides the pixel's blue value by the brightness to determine
the normalized blue value b, where b = B/(R+G+B). Together, these two values, r and b,
define the image tone for each pixel.
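
By way of illustration only, the normalization performed by adder 704 and dividers 706 and 708 may be sketched as follows. The function name `normalized_rb` is hypothetical, and returning (0.0, 0.0) for a pure black pixel is an assumed convention, since the description does not address a zero sum.

```python
def normalized_rb(red, green, blue):
    """Return (r, b) where r = R/(R+G+B) and b = B/(R+G+B)."""
    total = red + green + blue  # brightness L
    if total == 0:
        return (0.0, 0.0)       # assumed convention for black pixels
    return (red / total, blue / total)
```

Because both values are ratios, scaling all three components by the same factor leaves r and b unchanged, which is why tone is separated from brightness.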

Values r and b are supplied to a tone filter 710. Interestingly, it has been found that
although images of human skin appear markedly different to viewers (e.g., white, black,
yellow, brown, etc.), this difference is a function of the image brightness rather than the tone.
In fact, it has been found that the distribution of pixels representing skin in an image is
relatively constant and follows a Gaussian distribution. Therefore, if the normalized red and
blue values of all the pixels in an image are plotted on a graph of r vs. b, approximately 95%
of pixels in the image that represent skin will fall within three standard deviations of the
intersection of the mean values of r and b for pixels representing skin. Tone filter 710
identifies pixels having r and b values within three standard deviations of the mean values of
r and b and thus identifies portions of the image that are likely to include skin.
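
By way of illustration only, the test applied by tone filter 710 may be sketched as follows. The sketch assumes that the mean and standard deviation of r and b are estimated from a set of labeled skin pixels and that r and b are tested independently, which gives a rectangular acceptance region. The description does not state how the statistics are obtained or whether the two values are tested jointly. The function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_tone_model(samples):
    """Estimate (mean, std) of r and of b from (r, b) pairs of known skin pixels."""
    rs = [r for r, _ in samples]
    bs = [b for _, b in samples]
    return (mean(rs), pstdev(rs)), (mean(bs), pstdev(bs))


def is_skin_tone(r, b, model, k=3.0):
    """True when r and b each lie within k standard deviations of their means."""
    (mean_r, std_r), (mean_b, std_b) = model
    return abs(r - mean_r) <= k * std_r and abs(b - mean_b) <= k * std_b
```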

Interestingly, it has been found that areas in an image representing skin typically have
relatively low granularity. As a consequence, such areas of the image have little energy in the
high spatial frequency. Areas of the image that include skin can therefore be distinguished by
a high-pass spatial filter. A preferred embodiment for a texture filter 712 incorporating such
a high-pass spatial filter is described in connection with Figs. 8A-B.

Texture filter 712 preferably employs multi-resolution median ring filtering to capture
multi-resolution textural structure in the image being considered. A median filter may
essentially be considered as a band-pass filter. Median filters are non-linear and, in most
cases, are more robust against spiky image noise. Such filters capture edge pixels in multiple
resolutions using a recursive algorithm, depicted in Fig. 8A.

As shown in Fig. 8A, in step 802, the filter is set to a first ring radius r. In a preferred
embodiment, r may be initially set to 13. In step 804, the image is filtered by replacing each
pixel x_c in the image with the median of the values of eight pixels lying on a circle at radius r
from pixel x_c, as shown in Fig. 8B for the example of r=3. Thus, each pixel x_c is replaced by:
median(x_0, x_1, x_2, ..., x_7). This process is equivalent to conducting a non-linear band-pass
filtering of the image.

In step 806, it is determined whether r=1. If it is, then the process finishes at step 808.
Otherwise, r is set to r-1 (step 810), and the process loops back to step 804 to again filter the
image. Thus, filtering is recursively conducted until r is equal to 1.
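
By way of illustration only, the loop of steps 802 through 810 may be sketched as follows for a grayscale image stored as a list of rows. The sketch assumes that the eight ring pixels are spaced 45 degrees apart with offsets rounded to the nearest pixel, and that coordinates falling outside the image are clamped to the border. Neither detail is specified in the description. The function names and the default starting radius of 3 are hypothetical choices.

```python
import math
from statistics import median


def ring_offsets(radius):
    """Eight (dy, dx) offsets spaced 45 degrees apart on a circle of the given radius."""
    return [(round(radius * math.sin(k * math.pi / 4)),
             round(radius * math.cos(k * math.pi / 4))) for k in range(8)]


def median_ring_pass(image, radius):
    """Replace each pixel with the median of eight pixels on its ring (step 804)."""
    height, width = len(image), len(image[0])
    offsets = ring_offsets(radius)
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            ring = []
            for dy, dx in offsets:
                yy = min(max(y + dy, 0), height - 1)  # clamp to image border
                xx = min(max(x + dx, 0), width - 1)
                ring.append(image[yy][xx])
            row.append(median(ring))
        out.append(row)
    return out


def multi_resolution_smooth(image, start_radius=3):
    """Filter repeatedly while decreasing the radius to 1 (steps 806 and 810)."""
    result = image
    for radius in range(start_radius, 0, -1):
        result = median_ring_pass(result, radius)
    return result
```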

The resulting image is a smoothed version of the original image at various resolutions. 
Texture filter 712 then subtracts this resulting image from the original image to obtain the
texture image. 

Once the texture image is obtained, a local 5x5 average "I" of the image is obtained
for each pixel (i,j) and that average is compared to a threshold. If I(i,j) > threshold, then (i,j)
is considered to be a textural pixel, and thus does not represent a skin area. Otherwise, if
I(i,j) < threshold, then (i,j) is considered not a textural pixel.
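
By way of illustration only, the local averaging and threshold comparison may be sketched as follows. The sketch assumes that the texture image is the absolute difference between the original and the smoothed image, so that positive and negative differences do not cancel in the average, and that windows are truncated at the image border. Both are assumptions. The function name `texture_mask` is hypothetical.

```python
def texture_mask(original, smoothed, threshold, window=5):
    """Return a grid of booleans that are True where the pixel is textural."""
    height, width = len(original), len(original[0])
    # Texture image: per-pixel absolute difference (assumed form).
    diff = [[abs(original[y][x] - smoothed[y][x]) for x in range(width)]
            for y in range(height)]
    half = window // 2
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [diff[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            local_average = sum(values) / len(values)  # I(i,j)
            row.append(local_average > threshold)
        mask.append(row)
    return mask
```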

The outputs of tone filter 710 and texture filter 712 are ANDed together by logical
AND 714. If tone filter 710 identifies a pixel as having a skin tone and texture filter 712
identifies a pixel as being a not textural pixel, then the output of logical AND 714 indicates
that the pixel represents a skin area.
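
By way of illustration only, logical AND 714 may be sketched as follows, assuming that each filter output is available as a grid of booleans of the same size. The function name `skin_mask` is hypothetical.

```python
def skin_mask(tone_mask, textural_mask):
    """A pixel is skin when it has a skin tone AND is not a textural pixel."""
    return [[tone and not textural for tone, textural in zip(tone_row, tex_row)]
            for tone_row, tex_row in zip(tone_mask, textural_mask)]
```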

As noted above, in a preferred embodiment, URL cache 18 may be populated by a
web spider 26. Web spider 26 may preferably be co-located with main server 10, and may
periodically download to server 14 an updated list 28 of URLs of pornographic web sites that
it has compiled. Web spider 26 is preferably provided with a copy of the lexicon described
above as well as the text analysis engine and image analysis engine described above so as to
permit it to recognize pornographic material. A preferred embodiment of a particular web
spider for use with the present system is now described in connection with Fig. 9.

As shown in Fig. 9, in step 902, web spider 26 is provided with a first URL of a web
site known to contain pornographic material. In a preferred embodiment, the web site is one
that comprises a plurality of links to both additional pages at the pornographic website, as
well as other pornographic websites.

In step 904, web spider 26 retrieves the web page associated with the first URL. In
step 906, web spider 26 determines whether the retrieved web content contains pornographic
material. If it does, then in step 908, web spider 26 adds the URL to list 28.

In step 910, web spider 26 then retrieves another web page having a link in the first
URL that it received. The process then returns to step 906, where web spider 26 again
determines whether the retrieved web page comprises pornographic material and, if it does, to
step 908, where the URL of the pornographic page is added to list 28.

This loop preferably continues until web spider 26 exhausts all web pages that link,
directly or indirectly, to the first URL that it was provided. At that point, an additional "seed"
URL may be provided to web spider 26, and the process may continue.

In a preferred embodiment, web spider 26 employs a width-first algorithm to explore
all linked web pages. Thus, for example, web spider 26 examines the web pages linked by
direct links to the original URL before proceeding to drill down and examine additional pages
linked to those pages that link to the original URL.
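
By way of illustration only, the width-first (breadth-first) traversal of Fig. 9 may be sketched as follows. To keep the sketch self-contained, page retrieval and content analysis are passed in as the hypothetical callables `get_links` and `is_unacceptable`. The `max_pages` limit is an added safeguard that is not part of the description.

```python
from collections import deque


def crawl(seed, get_links, is_unacceptable, max_pages=1000):
    """Visit pages breadth-first from the seed and return the flagged URLs in visit order."""
    flagged = []              # corresponds to list 28
    seen = {seed}
    queue = deque([seed])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        if is_unacceptable(url):          # steps 906 and 908
            flagged.append(url)
        for link in get_links(url):       # step 910
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return flagged
```

Because a first-in, first-out queue is used, every page linked directly from the seed is examined before any page that is two links away.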






In a preferred embodiment, if any page in a website is discovered as comprising
pornographic material, all pages "below" that page in the sitemap for the web site may be
blocked. Pages above the pornographic page may preferably remain unblocked.
Alternatively, an entire website may be designated unacceptable if any of its web pages are
unacceptable.
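
By way of illustration only, one way to decide whether a page lies "below" a flagged page is sketched here. The sketch assumes that the sitemap hierarchy mirrors the URL path hierarchy, which the description does not state. The function name `is_below` is hypothetical.

```python
from urllib.parse import urlsplit


def is_below(candidate, flagged):
    """True when candidate is the flagged page or sits under its path on the same host."""
    c, f = urlsplit(candidate), urlsplit(flagged)
    if (c.hostname or "").lower() != (f.hostname or "").lower():
        return False
    base = f.path.rstrip("/")
    return c.path == base or c.path.startswith(base + "/")
```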

In a further preferred embodiment, a user may program the system to filter out
additional subject matter that is not, strictly speaking, pornographic. For example, if desired,
the system may identify material relating to the concepts "bikini" or "lingerie". In the
exemplary lexicon shown in Fig. 4B, for example, the words "lingerie," "bra," etc. are
included in the lexicon and assigned a second value equal to "1" to identify them as
belonging to the lingerie category. The system will then search for these terms during the text
analysis and, either on the basis of text alone, or in combination with the image analysis, will
identify and block web content directed to these subjects.

In addition, a user may program the system to filter out subject matter relating to other
areas such as hate, cults, or violence by adding terms relating to these concepts to the lexicon.
The system will then search for these terms during the text analysis and block web content
directed to these subjects. In the exemplary lexicon shown in Fig. 4B, for example, words
associated with hate groups may be added to the lexicon and assigned a second value equal to
2, words associated with cults may be added to the lexicon and assigned a second value equal
to 3, and words associated with violence may be added to the lexicon and assigned a second
value equal to 4. In addition, other words that do not necessarily correspond to a defined
category (e.g., marijuana), may be added to the lexicon and assigned a second value equal,
e.g., to 5, if they are deemed likely to occur in objectionable material.
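
By way of illustration only, a lexicon whose entries carry a first value and a category code may be parsed and filtered as sketched below. The sketch assumes each entry is written as a quoted term followed by two comma-separated numbers. The category codes follow the description above. The function names are hypothetical.

```python
import csv

# Category codes as described above.
CATEGORY_NAMES = {0: "pornography", 1: "lingerie", 2: "hate",
                  3: "cults", 4: "violence", 5: "other"}


def parse_lexicon(lines):
    """Parse entries of the form '"term", value, category' into a dict."""
    lexicon = {}
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        term, value, category = row
        lexicon[term.lower()] = (float(value), int(category))
    return lexicon


def active_terms(lexicon, enabled_categories):
    """Return term -> first value for the categories a user has enabled."""
    return {term: value for term, (value, category) in lexicon.items()
            if category in enabled_categories}
```

For example, enabling categories {0, 1} would make the text analysis consider both pornography and lingerie terms while ignoring the others.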

In another aspect, the present system may also comprise the capability to insert
advertisements into web pages displayed to a user. This preferred embodiment is described
in connection with Fig. 10. As shown in Fig. 10, in step 1002, server 14 receives a web page
from web 12. In step 1004, server 14 determines whether the content of the web page is
acceptable, as described in detail above.

In step 1006, server 14 retrieves from memory an advertisement for insertion into the
web page. In a preferred embodiment, this advertisement may include an html link to be
inserted near the top of the retrieved html web page.

In step 1008, server 14 inserts the advertisement into the retrieved web content. Thus,
for example, after the ad is inserted, the retrieved web content may take the following form:

<html>
<head> </head>
<body>
<a href="http://www.__.com"> Buy Golf Equipment! </a>
</body>
</html>
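
By way of illustration only, the server-side insertion of step 1008 may be sketched as follows. The sketch places the advertisement immediately after the opening body tag, located with a regular expression. Both choices are assumptions, as is the fallback of prepending the advertisement when no body tag is found. The function name `insert_advertisement` and the example URL are hypothetical.

```python
import re


def insert_advertisement(html, ad_html):
    """Insert ad_html right after the opening <body> tag, or prepend it if none exists."""
    match = re.search(r"<body\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return ad_html + html
    return html[:match.end()] + "\n" + ad_html + html[match.end():]


page = "<html><head></head><body><p>Hi</p></body></html>"
ad = '<a href="http://ads.example/">Buy Golf Equipment!</a>'
result = insert_advertisement(page, ad)
```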

In a preferred embodiment, server 14 inserts the advertisement into the top portion of
the retrieved web page, even if the retrieved web page comprises several frames. This may be
accomplished, for example, with a short piece of Javascript. For example:

<script language="JavaScript">
// insert() and advertisement are placeholders, as in the original example
if (self == top || self == top.frames[0])
    insert(advertisement);
</script>

While the invention has been described in conjunction with specific embodiments, it
is evident that numerous alternatives, modifications, and variations will be apparent to those
skilled in the art in light of the foregoing description.

Claims

1. A system for identifying possibly pornographic web sites comprising:
    a feature extraction module, the feature extraction module comprising:
        a first module for extracting the URL of the website from a request for web
        content;
        a second module for extracting text from text portions of the web page;
        a third module for extracting image portions from the web page that likely
        correspond to the skin of an individual; and
    a fusion module for evaluating the output from the feature extraction module and
    determining whether the web page comprises possibly pornographic content.

2. The system of claim 1, further comprising a URL cache.

3. The system of claim 2, wherein the URL cache comprises a list of unacceptable URLs.

4. The system of claim 2, wherein the URL cache comprises a list of acceptable URLs.

5. The system of claim 4, wherein the acceptable URLs are accessible only by authorized
individuals.

6. The system of claim 2, wherein the URL cache is populated by a web spider.

7. The system of claim 1, further comprising a list of words found in pornographic material.

8. The system of claim 7, wherein each word in the list is assigned a value.

9. The system of claim 8, further comprising a text analysis engine.

10. The system of claim 9, wherein the text analysis engine multiplies the assigned value for
every word on the list that is also in the text portion of a web page by an associated value,
sums together the products, and supplies the sum to a thresholder implementing a sigmoid
function.

11. The system of claim further comprising an image analysis engine.

12. The system of claim 11, further comprising a tone filter.

13. The system of claim 11, further comprising a texture filter.

14. A method for inserting an advertisement into retrieved web content, comprising:
    retrieving web content;
    retrieving an advertisement;
    inserting the advertisement into the web content in a computer that is either the client
    computer that requested the web content or a server connected to the same LAN or WAN as
    the computer that requested the web content.

15. The method of claim 14, wherein the advertisement comprises html content.

16. The method of claim 14, further comprising the step of checking the web content to
determine if it is pornographic before permitting the web content to be displayed to a user.

"areola", 0.12, 0
"clitoris", 2.00, 0
"condom", 0.12, 0
"diaphragm", 0.12, 0
"dominance", 0.12, 0
"E-zines", 8.00, 0
"foreskin", 0.12, 0
"genitalia", 0.25, 0
"hymen", 2.00, 0
"kinky", 4.00, 0
"labia", 2.00, 0
"nasty", 0.50, 0
"seductive", 0.50, 0
"submission", 0.12, 0
"swinging", 0.50, 0
"urine", 0.12, 0
"urination", 0.50, 0
"adultcheck", 8.00, 0
"adultsights", 8.00, 0
"anal", 8.00, 0
"analingus", 8.00, 0
"ass", 2.00, 0
"asshole", 8.00, 0
"beastiality", 8.00, 0
"bestial", 8.00, 0
"bestiality", 8.00, 0
"bisexual", 1.00, 0
"blowjob", 8.00, 0
"blowjobs", 8.00, 0
"bomb", 0.12, 0
"bondage", 8.00, 0
"boob", 2.00, 0
"buttfucking", 8.00, 0
"cannibalism", 8.00, 0
"clit", 2.00, 0
"cock", 8.00, 0
"cocks", 8.00, 0
"coitus", 8.00, 0
"copulate", 4.00, 0
"copulation", 4.00, 0
"cum", 8.00, 0
"cumshot", 8.00, 0
"cumshots", 8.00, 0
"cunnilingus", 8.00, 0
"cunt", 8.00, 0
"cunts", 8.00, 0
"decadence", 2.00, 0
"dicks", 8.00, 0
"dildo", 8.00, 0
"dildos", 8.00, 0
"doobie", 8.00, 0
"drugs", 0.12, 0
"ejaculate", 2.00, 0
"ejaculation", 2.00, 0
"erection", 4.00, 0
"erotic", 8.00, 0
"erotica", 8.00, 0
"exhibitionism", 8.00, 0
"exhibitionist", 8.00, 0
"exhibitionists", 8.00, 0
"felching", 8.00, 0
"fellatio", 8.00, 0
"fetish", 2.00, 0
"fetishes", 2.00, 0
"fistfuck", 8.00, 0
"fisting", 8.00, 0
"flesh", 1.00, 0
"frottage", 8.00, 0
"fuck", 8.00, 0
"fucked", 8.00, 0
"fuckers", 8.00, 0
"fucking", 8.00, 0
"gangbang", 8.00, 0
"gerbiling", 8.00, 0
"groupsex", 8.00, 0
"hard-on", 8.00, 0
"hardcore", 8.00, 0
"harden", 8.00, 0
"heterosexual", 1.00, 0
"homosexual", 1.00, 0
"horniest", 4.00, 0
"horny", 4.00, 0
"incest", 2.00, 0
"intercourse", 8.00, 0
"jism", 8.00, 0
"kinky", 8.00, 0
"lesbian", 1.00, 0
"lezbos", 2.00, 0
"lusting", 2.00, 0
"masochism", 8.00, 0
"masturbate", 2.00, 0
"masturbation", 2.00, 0
"nude", 4.00, 0
"nudes", 4.00, 0
"nudity", 4.00, 0
"nympho", 8.00, 0
"nymphomania", 8.00, 0
"nymphomaniac", 8.00, 0
"obsex", 8.00, 0
"orgasm", 4.00, 0
"orgy", 8.00, 0
"penis", 2.00, 0
"perverse", 1.00, 0
"perversion", 1.00, 0
"perverted", 1.00, 0
"porn", 0.50, 0
"porno", 0.50, 0
"pornography", 0.50, 0
"prick", 1.00, 0
"prostitution", 0.50, 0
"pussies", 8.00, 0
"pussy", 8.00, 0
"rape", 0.50, 0
"rimming", 0.12, 0
"sadism", 8.00, 0
"sadomasochism", 8.00, 0
"s&m", 8.00, 0
"s/m", 8.00, 0
"screwing", 8.00, 0
"sexy", 0.25, 0
"sexual", 0.25, 0
"shemales", 4.00, 0
"slut", 2.00, 0
"sluts", 2.00, 0
"smut", 4.00, 0
"snatch", 0.12, 0
"snatches", 0.12, 0
"sodomy", 2.00, 0
"spank", 4.00, 0
"spunk", 8.00, 0
"suck", 2.00, 0
"threesome", 0.25, 0
"tit", 4.00, 0
"tits", 4.00, 0
"transexuality", 4.00, 0
"transvestite", 4.00, 0
"twat", 8.00, 0
"vibrator", 8.00, 0
"voyeur", 8.00, 0
"voyeurism", 8.00, 0
"vulva", 8.00, 0
"whore", 8.00, 0
"XXX", 8.00, 0
"zoophile", 8.00, 0
"zoophilia", 8.00, 0
"asb", 8.00, 0
"asw", 8.00, 0
"ass", 4.00, 0
"assd", 8.00, 0
"apbe", 8.00, 0
"b&d", 8.00, 0
"bdsm", 8.00, 0
"d&s", 8.00, 0
"motas", 8.00, 0
"motos", 8.00, 0
"motss", 8.00, 0
"sensual", 0.50, 0
"sensuality", 0.50, 0
"lingerie", 4.00, 1
"panty", 4.00, 1
"bra", 1.00, 1
"bras", 1.00, 1
"marijuana", 0.50, 5
"underware", 2.00, 1
"luscious", 2.00, 0
"intimacy", 0.75, 0
"intimate", 0.75, 0
"dominatrix", 2.00, 0
"dominant", 0.50, 0
"dominance", 0.50, 0
"submission", 0.50, 0
"submissive", 0.50, 0